Last updated: 2025-05-05

Checks: 5 2

Knit directory: PD1_mm/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20240223) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.

absolute relative
/home/hnatri/PD1_mm/ .
/home/hnatri/PD1_mm/code/utilities.R code/utilities.R
/home/hnatri/PD1_mm/code/PD1_mm_themes.R code/PD1_mm_themes.R
/home/hnatri/PD1_mm/code/CART_plot_functions.R code/CART_plot_functions.R
/home/hnatri/PD1_mm/PD1_CART_celltype_markers_top20.tsv PD1_CART_celltype_markers_top20.tsv
/home/hnatri/PD1_mm/analysis/comparative_analysis.Rmd analysis/comparative_analysis.Rmd

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 0bf2420. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .RData
    Ignored:    analysis/figure/

Untracked files:
    Untracked:  Rplots.pdf
    Untracked:  code/slurm.20555266.err
    Untracked:  code/slurm.20555266.out
    Untracked:  code/slurm.20555410.err
    Untracked:  code/slurm.20555410.out

Unstaged changes:
    Modified:   PD1_CART_celltype_markers_top20.tsv
    Modified:   analysis/CellChat.Rmd
    Modified:   analysis/annotate.Rmd
    Modified:   analysis/cleaner_plots.Rmd
    Modified:   analysis/comparative_analysis.Rmd
    Modified:   code/CellChat.Rout

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/comparative_analysis.Rmd) and HTML (docs/comparative_analysis.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd fb5a812 heinin 2025-04-21 Updated CellChat and GSEA, added plots without day 12
Rmd 7c73c80 heinin 2024-05-06 Added GO analysis
html 7c73c80 heinin 2024-05-06 Added GO analysis
html 8c61c94 heinin 2024-03-19 Build site.
html 46627b7 heinin 2024-03-19 Build site.
Rmd 0a6e0b7 heinin 2024-03-19 wflow_publish(c("analysis/comparative_analysis.Rmd", "analysis/index.Rmd"))
html 0a6e0b7 heinin 2024-03-19 wflow_publish(c("analysis/comparative_analysis.Rmd", "analysis/index.Rmd"))
Rmd cef63a7 heinin 2024-03-19 Added DEGs by timepoint
html cef63a7 heinin 2024-03-19 Added DEGs by timepoint
Rmd c63a16f heinin 2024-03-14 Changed Mono1 to M2, added an analysis file to look at exhaustion markers
html c63a16f heinin 2024-03-14 Changed Mono1 to M2, added an analysis file to look at exhaustion markers
Rmd f1e402a heinin 2024-03-13 Added Seurat DEGs
html f1e402a heinin 2024-03-13 Added Seurat DEGs
Rmd d62a214 heinin 2024-03-12 Saving DEG overlap tables
html d62a214 heinin 2024-03-12 Saving DEG overlap tables
Rmd 4cf95d7 heinin 2024-03-12 Top markers and DEGs for all timepoints combined
html 4cf95d7 heinin 2024-03-12 Top markers and DEGs for all timepoints combined
Rmd ea97b27 heinin 2024-02-27 Filtering, comparative analysis
html ea97b27 heinin 2024-02-27 Filtering, comparative analysis
Rmd 18c1bd4 heinin 2024-02-27 Cell type proportion testing
html 18c1bd4 heinin 2024-02-27 Cell type proportion testing
Rmd 3e207b9 heinin 2024-02-26 Added scImmuCC annotations
Rmd 196db6a heinin 2024-02-26 Starting the comparative analysis

Introduction

Comparing treatment groups/timepoints. Note: The DEGs were previously called with “data” as the assay, but the expression values were not normalized and the analysis was thus done on the raw counts (?), yielding more DEGs.

Packages and environment variables

suppressPackageStartupMessages({
  library(Seurat)
  library(SeuratObject)
  library(SeuratDisk)
  library(tidyverse)
  library(tibble)
  library(ggplot2)
  library(ggpubr)
  library(ggrepel)
  library(workflowr)
  library(googlesheets4)
  library(scProportionTest)
  library(UpSetR)})

setwd("/home/hnatri/PD1_mm/")
set.seed(9999)
options(ggrepel.max.overlaps = Inf)

# Colors, themes, cell type markers, and plot functions
source("/home/hnatri/PD1_mm/code/utilities.R")
source("/home/hnatri/PD1_mm/code/PD1_mm_themes.R")
source("/home/hnatri/PD1_mm/code/CART_plot_functions.R")

Importing data

# Previously PD1_mm_Seurat_merged.Rds, using most up to date object when
# replotting. PD1_mm_Seurat_GSEA.Rds has GSEA metadata for each cell.
seurat_data <- readRDS("/tgen_labs/banovich/BCTCSF/PD1_mm_Seurat/PD1_mm_Seurat_merged.Rds")
seurat_data$Treatment_Day <- paste0(seurat_data$Treatment, "_", seurat_data$Day)
seurat_data$celltype <- factor(seurat_data$celltype,
                               levels = sort(as.character(unique(seurat_data$celltype))))

# Updating annotations
gs4_deauth()
markers_annotations  <- gs4_get("https://docs.google.com/spreadsheets/d/1iWYBouwQlQboI-rwiujC0QKJ6lq9XeTffbKm2Nz8es0/edit?usp=sharin#g")
sheet_names(markers_annotations)
annotations <- read_sheet(markers_annotations, sheet = "Cluster annotations")

seurat_data$celltype <- mapvalues(seurat_data$snn_res.0.5,
                                  from = annotations$snn_res.0.5,
                                  to = annotations$annotation)

#saveRDS(seurat_data, "/tgen_labs/banovich/BCTCSF/PD1_mm_Seurat/PD1_mm_Seurat_merged.Rds")

# Normalizing all features in the Seurat object for plotting
#seurat_data <- NormalizeData(seurat_data, assay = "RNA", verbose = F)
#seurat_data <- NormalizeData(seurat_data, assay = "RNA_human", verbose = F)

#DefaultAssay(seurat_data) <- "RNA"
#VariableFeatures <- rownames(seurat_data)
#seurat_data <- ScaleData(seurat_data,
#                         vars.to.regress = c("percent.mt_RNA"),
#                         verbose = F)
#
#DefaultAssay(seurat_data) <- "RNA_human"
#VariableFeatures <- rownames(seurat_data)
#seurat_data <- ScaleData(seurat_data,
#                         vars.to.regress = c("percent.mt_RNA"),
#                         verbose = F)

#saveRDS(seurat_data, "/tgen_labs/banovich/BCTCSF/PD1_mm_Seurat/PD1_mm_Seurat_GSEA.Rds")
seurat_data <- readRDS("/tgen_labs/banovich/BCTCSF/PD1_mm_Seurat/PD1_mm_Seurat_GSEA.Rds")

# Updating annotations
gs4_deauth()
markers_annotations  <- gs4_get("https://docs.google.com/spreadsheets/d/1iWYBouwQlQboI-rwiujC0QKJ6lq9XeTffbKm2Nz8es0/edit?usp=sharin#g")
sheet_names(markers_annotations)
 [1] "Cluster top markers"                              
 [2] "Cluster annotations"                              
 [3] "scImmuCC"                                         
 [4] "Celltype top markers (all sig)"                   
 [5] "Celltype top 20 markers"                          
 [6] "GSEA myeloid"                                     
 [7] "GSEA myeloid by celltype"                         
 [8] "NEO vs. CTRL DEGs"                                
 [9] "NEO vs. CTRL, Seurat DEGs"                        
[10] "NEO vs. ADJ DEGs"                                 
[11] "NEO vs. ADJ, Seurat DEGs"                         
[12] "ADJ vs. CTRL, Seurat DEGs"                        
[13] "NEO vs. CTRL, NEO vs. ADJ overlapping DEGs"       
[14] "NEO vs. CTRL, NEO vs. ADJ overlapping Seurat DEGs"
[15] "Mm immune markers"                                
annotations <- read_sheet(markers_annotations, sheet = "Cluster annotations")

seurat_data$celltype <- mapvalues(seurat_data$snn_res.0.5,
                                  from = annotations$snn_res.0.5,
                                  to = annotations$annotation)

DimPlot of cell type annotations

#celltypes <- sort(as.character(unique(seurat_data$celltype)))
#PD1_Kluc_celltype_col <- colorRampPalette(brewer.pal(11, "Paired"))(length(celltypes))
#names(PD1_Kluc_celltype_col) <- celltypes

DimPlot(seurat_data,
        group.by = "celltype",
        reduction = "umap",
        raster = T,
        label = T,
        cols = PD1_Kluc_celltype_col) &
  coord_fixed(ratio = 1) &
  theme_bw() &
  NoLegend()

Version Author Date
7c73c80 heinin 2024-05-06
c63a16f heinin 2024-03-14
4cf95d7 heinin 2024-03-12
18c1bd4 heinin 2024-02-27
DimPlot(seurat_data,
        group.by = "celltype",
        split.by = "Treatment",
        reduction = "umap",
        raster = T,
        #label = T,
        cols = PD1_Kluc_celltype_col) &
  coord_fixed(ratio = 1) &
  theme_bw() &
  NoLegend()

Version Author Date
7c73c80 heinin 2024-05-06
c63a16f heinin 2024-03-14
4cf95d7 heinin 2024-03-12
18c1bd4 heinin 2024-02-27

Cell type marker expression with annotations

DefaultAssay(seurat_data) <- "RNA_human"

# Top markers for each cluster
markers <- presto::wilcoxauc(seurat_data,
                             group_by = "celltype",
                             assay = "data",
                             seurat_assay = "RNA_human")

top_markers <- markers %>% group_by(group) %>% slice_max(order_by = auc, n = 2)

FeaturePlot(seurat_data,
            features = top_markers$feature,
            ncol = 5,
            reduction = "umap",
            raster = T,
            cols = c("gray89", "tomato3")) &
  coord_fixed(ratio = 1) &
  theme_bw() &
  NoLegend() &
  manuscript_theme

Version Author Date
7c73c80 heinin 2024-05-06
c63a16f heinin 2024-03-14
4cf95d7 heinin 2024-03-12
top_markers <- markers %>%  group_by(group) %>% slice_max(order_by = auc, n = 5)

# seurat_object, plot_features, group_var, group_colors, column_title, km=5, row.order = NULL
DefaultAssay(seurat_data) <- "RNA_human"
dotplot_heatmap <- create_dotplot_heatmap_horizontal(seurat_object = seurat_data,
                                                     plot_features = unique(top_markers$feature),
                                                     group_var = "celltype",
                                                     group_colors = PD1_Kluc_celltype_col,
                                                     column_title = "",
                                                     km = 5, col.order = NULL)

Version Author Date
7c73c80 heinin 2024-05-06
f1e402a heinin 2024-03-13
4cf95d7 heinin 2024-03-12

Cell type proportion differences

create_barplot(subset(seurat_data, subset = Day == 16),
               group_var = "Treatment",
               plot_var = "celltype",
               plot_levels = sort((unique(seurat_data$celltype))),
               group_levels = c("CTRL", "ADJ", "NEO"),
               plot_colors = PD1_Kluc_celltype_col,
               var_names =  c("Frequency (%)", ""),
               legend_title = "Celltype")

Version Author Date
7c73c80 heinin 2024-05-06
cef63a7 heinin 2024-03-19

In scatter plots, the proportions of cell types in each pair of treatment groups are plotted against each other with one group on each axis. The forest plots show the significance level.

unique(seurat_data$Treatment)
[1] "ADJ"  "CTRL" "NEO" 
unique(seurat_data$Day)
[1] 16 12
table(seurat_data$celltype, seurat_data$Treatment)
       
         ADJ CTRL  NEO
  M1    1090 2163 1603
  M2     436 1444 2735
  M3     910 1634 1452
  L1     571 1420 1475
  M8     526 1498 1321
  L2    1117 1045 1063
  NK     579  992 1167
  M4     449 1117 1054
  L5     502  921 1107
  M5     252  665 1454
  Neut1  251 1619  498
  DC     379  773  772
  M6     132  763  590
  B1     311  568  553
  Treg   178  521  523
  L6     160  403  615
  ILC    171  320  347
  L3     269  305  225
  L4      50  154  371
  M7      20  281  185
create_clusterpropplot(seurat_data,
                       group_var = "Treatment",
                       group1 = "ADJ",
                       group2 = "CTRL",
                       plot_var = "celltype",
                       plot_colors = PD1_Kluc_celltype_col,
                       var_names = c("ADJ", "CTRL"),
                       legend_title = "Treatment")
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

Version Author Date
7c73c80 heinin 2024-05-06
46627b7 heinin 2024-03-19
create_clusterpropplot(seurat_data,
                       group_var = "Treatment",
                       group1 = "NEO",
                       group2 = "CTRL",
                       plot_var = "celltype",
                       plot_colors = PD1_Kluc_celltype_col,
                       var_names = c("NEO", "CTRL"),
                       legend_title = "Treatment")

Version Author Date
7c73c80 heinin 2024-05-06
46627b7 heinin 2024-03-19
create_clusterpropplot(seurat_data,
                       group_var = "Treatment",
                       group1 = "ADJ",
                       group2 = "NEO",
                       plot_var = "celltype",
                       plot_colors = PD1_Kluc_celltype_col,
                       var_names = c("ADJ", "NEO"),
                       legend_title = "Treatment")

Version Author Date
7c73c80 heinin 2024-05-06
# Day 12
create_clusterpropplot(subset(seurat_data, subset = Day == 12),
                       group_var = "Treatment",
                       group1 = "NEO",
                       group2 = "CTRL",
                       plot_var = "celltype",
                       plot_colors = PD1_Kluc_celltype_col,
                       var_names = c("NEO", "CTRL"),
                       legend_title = "Treatment, Day 12")

Version Author Date
7c73c80 heinin 2024-05-06
# Day 16
create_clusterpropplot(subset(seurat_data, subset = Day == 16),
                       group_var = "Treatment",
                       group1 = "ADJ",
                       group2 = "CTRL",
                       plot_var = "celltype",
                       plot_colors = PD1_Kluc_celltype_col,
                       var_names = c("ADJ", "CTRL"),
                       legend_title = "Treatment, Day 16")

Version Author Date
7c73c80 heinin 2024-05-06
create_clusterpropplot(subset(seurat_data, subset = Day == 16),
                       group_var = "Treatment",
                       group1 = "NEO",
                       group2 = "CTRL",
                       plot_var = "celltype",
                       plot_colors = PD1_Kluc_celltype_col,
                       var_names = c("NEO", "CTRL"),
                       legend_title = "Treatment, Day 16")

Version Author Date
7c73c80 heinin 2024-05-06
create_clusterpropplot(subset(seurat_data, subset = Day == 16),
                       group_var = "Treatment",
                       group1 = "ADJ",
                       group2 = "NEO",
                       plot_var = "celltype",
                       plot_colors = PD1_Kluc_celltype_col,
                       var_names = c("ADJ", "NEO"),
                       legend_title = "Treatment, Day 16")

Version Author Date
7c73c80 heinin 2024-05-06
# Using scProportionTest
prop_test <- sc_utils(seurat_data)

prop_test <- permutation_test(
  prop_test, cluster_identity = "celltype",
  sample_1 = "CTRL", sample_2 = "ADJ",
  sample_identity = "Treatment")

perm_plot <- permutation_plot(prop_test)

perm_plot + scale_colour_manual(values = c("tomato", "azure2")) +
  #NoLegend() +
  ggtitle("ADJ vs. CTRL, all timepoints")
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

Version Author Date
7c73c80 heinin 2024-05-06
prop_test <- permutation_test(
  prop_test, cluster_identity = "celltype",
  sample_1 = "CTRL", sample_2 = "NEO",
  sample_identity = "Treatment")

perm_plot <- permutation_plot(prop_test)

perm_plot + scale_colour_manual(values = c("tomato", "azure2")) +
  #NoLegend() +
  ggtitle("NEO vs. CTRL, all timepoints")
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

Version Author Date
7c73c80 heinin 2024-05-06
prop_test <- permutation_test(
  prop_test, cluster_identity = "celltype",
  sample_1 = "ADJ", sample_2 = "NEO",
  sample_identity = "Treatment")

perm_plot <- permutation_plot(prop_test)

perm_plot + scale_colour_manual(values = c("tomato", "azure2")) +
  #NoLegend() +
  ggtitle("NEO vs. ADJ, all timepoints")
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

Version Author Date
7c73c80 heinin 2024-05-06
# Day 12 only
# For day 12, no ADJ sample
prop_test <- sc_utils(subset(seurat_data, subset = Day == 12))

prop_test <- permutation_test(
  prop_test, cluster_identity = "celltype",
  sample_1 = "CTRL", sample_2 = "NEO",
  sample_identity = "Treatment")

perm_plot <- permutation_plot(prop_test)

perm_plot + scale_colour_manual(values = c("tomato", "azure2")) +
  #NoLegend() +
  ggtitle("NEO vs. CTRL, day 12")
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

Version Author Date
7c73c80 heinin 2024-05-06
# Day 16 only
prop_test <- sc_utils(subset(seurat_data, subset = Day == 16))

prop_test <- permutation_test(
  prop_test, cluster_identity = "celltype",
  sample_1 = "CTRL", sample_2 = "ADJ",
  sample_identity = "Treatment")

perm_plot <- permutation_plot(prop_test)

perm_plot + scale_colour_manual(values = c("tomato", "azure2")) +
  #NoLegend() +
  ggtitle("ADJ vs. CTRL, day 16")
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

Version Author Date
7c73c80 heinin 2024-05-06
prop_test <- permutation_test(
  prop_test, cluster_identity = "celltype",
  sample_1 = "CTRL", sample_2 = "NEO",
  sample_identity = "Treatment")

perm_plot <- permutation_plot(prop_test)

perm_plot + scale_colour_manual(values = c("tomato", "azure2")) +
  #NoLegend() +
  ggtitle("NEO vs. CTRL, day 16")
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

Version Author Date
7c73c80 heinin 2024-05-06
prop_test <- permutation_test(
  prop_test, cluster_identity = "celltype",
  sample_1 = "ADJ", sample_2 = "NEO",
  sample_identity = "Treatment")

perm_plot <- permutation_plot(prop_test)

perm_plot + scale_colour_manual(values = c("tomato", "azure2")) +
  #NoLegend() +
  ggtitle("NEO vs. ADJ, day 16")
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

Version Author Date
7c73c80 heinin 2024-05-06

Cluster markers

DefaultAssay(seurat_data) <- "RNA_human"

# Dropping MT and RP genes before calling markers
RBMTgenes <- grep(pattern = "^RP[SL]|^MRP[SL]|^MT-",
                  x = rownames(seurat_data@assays$RNA_human@data),
                  value = TRUE, invert = TRUE)
#seurat_data <- subset(seurat_data, features = RBMTgenes)

# Top markers for each cluster
markers <- presto::wilcoxauc(seurat_data,
                             group_by = "celltype",
                             assay = "data",
                             seurat_assay = "RNA_human")

sig_markers <- markers %>% filter(auc>0.8, padj<0.01)

dim(sig_markers)
[1] 2315   10
table(sig_markers$group)

   B1    DC   ILC    L1    L2    L3    L5    L6    M1    M2    M3    M7    M8 
   28    43    74    10    33  1091    26    74    74    37   135   488    53 
Neut1    NK  Treg 
   15    20   114 
table(sig_markers$group) %>% as.data.frame() %>%
  ggplot(aes(x = reorder(Var1, -Freq), y = Freq, fill = Var1)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = PD1_Kluc_celltype_col) +
    theme_classic2() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
    NoLegend() +
    xlab("Cell type") +
    ylab("# cell type markers")

Version Author Date
7c73c80 heinin 2024-05-06
46627b7 heinin 2024-03-19
# Saving to a file
write.table(sig_markers, "/scratch/hnatri/CART/PD1_CART_celltype_markers.tsv",
            sep = "\t", quote = F, row.names = F)

# Top markers for each cluster
top_markers <- markers %>% group_by(group) %>% slice_max(order_by = auc, n = 20)

write.table(top_markers, "/home/hnatri/PD1_mm/PD1_CART_celltype_markers_top20.tsv",
            quote = F, row.names = F, sep = "\t")

DEGs between treatment groups

# For each cluster, top DEGs between NEO and CTRL
DEG_NEO_CTRL <- lapply(unique(seurat_data$celltype), function(xx){
  data_subset <- subset(seurat_data, subset = celltype == xx)
  if (all((c("NEO", "CTRL") %in% data_subset$Treatment) == c(T, T))){
    markers <- presto::wilcoxauc(data_subset,
                                 group_by = "Treatment",
                                 groups_use = c("NEO", "CTRL"),
                                 assay = "data",
                                 seurat_assay = "RNA_human")
    markers$celltype <- xx
    
    return(markers)
  } else {
    return(NULL)
  }
})
names(DEG_NEO_CTRL) <- unique(seurat_data$celltype)
DEG_NEO_CTRL[sapply(DEG_NEO_CTRL, is.null)] <- NULL

DEG_NEO_CTRL_df <- as.data.frame(do.call(rbind, DEG_NEO_CTRL))

# Distribution of log2FC
hist(DEG_NEO_CTRL_df$pval)

Version Author Date
7c73c80 heinin 2024-05-06
hist(DEG_NEO_CTRL_df$logFC)

DEG_NEO_CTRL_df_sig <- DEG_NEO_CTRL_df %>%
  filter(group=="NEO",
         padj < 0.01,
         abs(logFC) > 10,
         (pct_in > 50 | pct_out > 50))

# Saving to a file
write.table(DEG_NEO_CTRL_df_sig,
            "/scratch/hnatri/CART/PD1_CART_DEG_NEO_CTRL_sig.tsv",
            sep = "\t", quote = F, row.names = F)

# Comparing to the Seurat function
DEG_NEO_CTRL_Seurat <- lapply(unique(seurat_data$celltype), function(xx){
  data_subset <- subset(seurat_data, subset = celltype == xx)
  Idents(data_subset) <- data_subset$Treatment
  if (all((c("NEO", "CTRL") %in% data_subset$Treatment) == c(T, T))){
    markers <- FindMarkers(data_subset,
                           ident.1 = "NEO",
                           ident.2 = "CTRL",
                           assay = "RNA_human",
                           verbose = F)
    markers$feature <- rownames(markers)
    markers$celltype <- xx
    
    return(markers)
  } else {
    return(NULL)
  }
})
names(DEG_NEO_CTRL_Seurat) <- unique(seurat_data$celltype)
DEG_NEO_CTRL_Seurat[sapply(DEG_NEO_CTRL_Seurat, is.null)] <- NULL

DEG_NEO_CTRL_Seurat_df <- as.data.frame(do.call(rbind, DEG_NEO_CTRL_Seurat))

# Distribution of log2FC
hist(DEG_NEO_CTRL_Seurat_df$avg_log2FC)

DEG_NEO_CTRL_Seurat_df_sig <- DEG_NEO_CTRL_Seurat_df %>%
  filter(p_val_adj < 0.01,
         abs(avg_log2FC) > 100,
         (pct.1 > 0.50 | pct.2 > 0.50))

# Saving to a file
write.table(DEG_NEO_CTRL_Seurat_df_sig,
            "/scratch/hnatri/CART/PD1_CART_DEG_NEO_CTRL_Seurat_sig.tsv",
            sep = "\t", quote = F, row.names = F)

# DEGs shared between and unique to each method
length(intersect(DEG_NEO_CTRL_Seurat_df_sig$feature, DEG_NEO_CTRL_df_sig$feature))
[1] 36
length(setdiff(DEG_NEO_CTRL_Seurat_df_sig$feature, DEG_NEO_CTRL_df_sig$feature))
[1] 70
length(setdiff(DEG_NEO_CTRL_df_sig$feature, DEG_NEO_CTRL_Seurat_df_sig$feature))
[1] 9
#hist(DEG_NEO_CTRL_Seurat_df_sig$avg_log2FC)
#hist(DEG_NEO_CTRL_df_sig$logFC)
#
#hist(DEG_NEO_CTRL_Seurat_df_sig$p_val_adj)
#hist(DEG_NEO_CTRL_df_sig$padj)

# Comparing logFC values
merge(DEG_NEO_CTRL_Seurat_df, DEG_NEO_CTRL_df, by = c("feature", "celltype")) %>%
  ggplot(aes(x = avg_log2FC, y = logFC)) +
    geom_point() +
    theme_classic2()

# Comparing logFC values for significant DEGs only
merge(DEG_NEO_CTRL_Seurat_df_sig, DEG_NEO_CTRL_df_sig, by = c("feature", "celltype")) %>%
  ggplot(aes(x = avg_log2FC, y = logFC)) +
    geom_point() +
    theme_classic2()

# Comparing adjusted p-values
merge(DEG_NEO_CTRL_Seurat_df, DEG_NEO_CTRL_df, by = c("feature", "celltype")) %>%
  ggplot(aes(x = p_val_adj, y = padj)) +
    geom_point() +
    theme_classic2()

# Comparing adjusted p-values for significant DEGs
merge(DEG_NEO_CTRL_Seurat_df_sig, DEG_NEO_CTRL_df_sig, by = c("feature", "celltype")) %>%
  ggplot(aes(x = p_val_adj, y = padj)) +
    geom_point() +
    theme_classic2()

# Plotting numbers of DEGs
table(DEG_NEO_CTRL_df_sig$celltype) %>% as.data.frame() %>%
  ggplot(aes(x = reorder(Var1, -Freq), y = Freq, fill = Var1)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = PD1_Kluc_celltype_col) +
    theme_classic2() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
    NoLegend() +
    xlab("Cell type") +
    ylab("# DEGs, NEO vs. CTRL, all timepoints")

table(DEG_NEO_CTRL_Seurat_df_sig$celltype) %>% as.data.frame() %>%
  ggplot(aes(x = reorder(Var1, -Freq), y = Freq, fill = Var1)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = PD1_Kluc_celltype_col) +
    theme_classic2() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
    NoLegend() +
    xlab("Cell type") +
    ylab("# DEGs, NEO vs. CTRL, all timepoints, Seurat")

# Top DEGs for each cluster
DEG_NEO_CTRL_df_sig %>% group_by(group) %>% slice_max(order_by = auc, n = 10)
# A tibble: 10 × 11
# Groups:   group [1]
   feature group avgExpr logFC statistic   auc      pval      padj pct_in
   <chr>   <chr>   <dbl> <dbl>     <dbl> <dbl>     <dbl>     <dbl>  <dbl>
 1 CMSS1   NEO      58.8  35.9    61586  0.897 3.14e- 55 5.32e- 51  100  
 2 CMSS1   NEO      44.1  25.8   982892  0.885 1.23e-205 2.08e-201   99.8
 3 LARS2   NEO      15.7  10.2    60629  0.883 1.07e- 51 9.05e- 48  100  
 4 CMSS1   NEO      20.8  11.6  2089805  0.881 2.91e-293 4.92e-289   99.1
 5 GPHN    NEO      29.5  17.1    59892. 0.873 7.99e- 49 4.51e- 45  100  
 6 CMSS1   NEO      28.6  14.2   496041  0.831 1.30e-112 2.20e-108   99.9
 7 CMSS1   NEO      30.5  16.2   959240. 0.829 3.80e-153 6.44e-149   99.9
 8 CAMK1D  NEO      23.7  13.4    56616  0.825 1.48e- 37 6.26e- 34  100  
 9 CMSS1   NEO      33.9  15.9   840013  0.824 1.13e-139 1.91e-135   99.4
10 CMSS1   NEO      26.3  14.6   652640  0.809 2.86e- 97 4.84e- 93   99.6
# ℹ 2 more variables: pct_out <dbl>, celltype <fct>
DEG_NEO_CTRL_Seurat_df_sig %>% group_by(celltype) %>% slice_max(order_by = avg_log2FC, n = 10)
# A tibble: 88 × 7
# Groups:   celltype [14]
      p_val avg_log2FC pct.1 pct.2 p_val_adj feature celltype
      <dbl>      <dbl> <dbl> <dbl>     <dbl> <chr>   <fct>   
 1 7.76e-17       Inf  0.996 0.997  1.31e-12 CTSD    M1      
 2 5.76e-27       342. 0.992 0.988  9.76e-23 RPS29   M1      
 3 5.49e-13       326. 0.991 0.989  9.31e- 9 RPS21   M1      
 4 2.70e-25       275. 0.983 0.975  4.57e-21 RPS27   M1      
 5 7.52e-18       236. 0.99  0.989  1.27e-13 RPL37A  M1      
 6 3.55e- 9       236. 0.981 0.978  6.02e- 5 RPL34   M1      
 7 3.36e-24       213. 0.976 0.979  5.70e-20 RPS28   M1      
 8 7.18e-12       203. 0.979 0.98   1.22e- 7 RPL39   M1      
 9 3.14e- 9       189. 0.984 0.983  5.32e- 5 RPL37   M1      
10 5.10e-20       156. 0.974 0.969  8.64e-16 RPL38   M1      
# ℹ 78 more rows
# For each cluster, top DEGs between NEO and ADJ
DEG_NEO_ADJ <- lapply(unique(seurat_data$celltype), function(xx){
  data_subset <- subset(seurat_data, subset = celltype == xx)
  if (all((c("NEO", "ADJ") %in% data_subset$Treatment) == c(T, T))){
    markers <- presto::wilcoxauc(data_subset,
                                 group_by = "Treatment",
                                 groups_use = c("NEO", "ADJ"),
                                 assay = "data",
                                 seurat_assay = "RNA_human")
    markers$celltype <- xx
    
    return(markers)
  } else {
    return(NULL)
  }
})
names(DEG_NEO_ADJ) <- unique(seurat_data$celltype)
DEG_NEO_ADJ[sapply(DEG_NEO_ADJ, is.null)] <- NULL

DEG_NEO_ADJ_df <- as.data.frame(do.call(rbind, DEG_NEO_ADJ))

DEG_NEO_ADJ_df_sig <- DEG_NEO_ADJ_df %>%
  filter(group == "NEO",
         padj < 0.01,
         auc > 0.6,
         (pct_in > 50 | pct_out > 50))

# Saving to a file
write.table(DEG_NEO_ADJ_df_sig,
            "/scratch/hnatri/CART/PD1_CART_DEG_NEO_ADJ_sig.tsv",
            sep = "\t", quote = F, row.names = F)

# Calling DEGs with Seurat
DEG_NEO_ADJ_Seurat <- lapply(unique(seurat_data$celltype), function(xx){
  data_subset <- subset(seurat_data, subset = celltype == xx)
  Idents(data_subset) <- data_subset$Treatment
  if (all((c("NEO", "ADJ") %in% data_subset$Treatment) == c(T, T))){
    markers <- FindMarkers(data_subset,
                           ident.1 = "NEO",
                           ident.2 = "ADJ",
                           assay = "RNA_human",
                           verbose = F)
    markers$feature <- rownames(markers)
    markers$celltype <- xx
    
    return(markers)
  } else {
    return(NULL)
  }
})
names(DEG_NEO_ADJ_Seurat) <- unique(seurat_data$celltype)
DEG_NEO_ADJ_Seurat[sapply(DEG_NEO_ADJ_Seurat, is.null)] <- NULL

DEG_NEO_ADJ_Seurat_df <- as.data.frame(do.call(rbind, DEG_NEO_ADJ_Seurat))

# Distribution of log2FC
hist(DEG_NEO_ADJ_Seurat_df$avg_log2FC)

DEG_NEO_ADJ_Seurat_df_sig <- DEG_NEO_ADJ_Seurat_df %>%
  filter(p_val_adj < 0.01,
         abs(avg_log2FC) > 100,
         (pct.1 > 0.50 | pct.2 > 0.50))

# Saving to a file
write.table(DEG_NEO_ADJ_Seurat_df_sig,
            "/scratch/hnatri/CART/PD1_CART_DEG_NEO_ADJ_Seurat_sig.tsv",
            sep = "\t", quote = F, row.names = F)

# Plotting
table(DEG_NEO_ADJ_df_sig$celltype) %>% as.data.frame() %>%
  ggplot(aes(x = reorder(Var1, -Freq), y = Freq, fill = Var1)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = PD1_Kluc_celltype_col) +
    theme_classic2() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
    NoLegend() +
    xlab("Cell type") +
    ylab("# DEGs, NEO vs. ADJ, all timepoints")

table(DEG_NEO_ADJ_Seurat_df_sig$celltype) %>% as.data.frame() %>%
  ggplot(aes(x = reorder(Var1, -Freq), y = Freq, fill = Var1)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = PD1_Kluc_celltype_col) +
    theme_classic2() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
    NoLegend() +
    xlab("Cell type") +
    ylab("# DEGs, NEO vs. ADJ, all timepoints, Seurat")

# Top DEGs for each cluster
DEG_NEO_ADJ_df_sig %>% group_by(group) %>% slice_max(order_by = auc, n = 10)
# A tibble: 10 × 11
# Groups:   group [1]
   feature group avgExpr logFC statistic   auc      pval      padj pct_in
   <chr>   <chr>   <dbl> <dbl>     <dbl> <dbl>     <dbl>     <dbl>  <dbl>
 1 LARS2   NEO     15.7   8.61    50174  0.829 1.65e- 36 2.80e- 32  100  
 2 GPHN    NEO     16.1   7.16   566186  0.815 1.84e- 99 3.12e- 95   99.8
 3 GPHN    NEO     29.5  13.7     48887  0.808 4.29e- 32 3.64e- 28  100  
 4 LARS2   NEO      9.96  4.93   945333  0.796 2.68e-127 4.55e-123   99.5
 5 LARS2   NEO      8.53  4.15   551043  0.793 1.06e- 86 8.99e- 83   99.5
 6 LARS2   NEO      9.55  4.33   428586  0.771 1.55e- 68 2.63e- 64   98.6
 7 LARS2   NEO      6.01  3.39   364471  0.770 1.68e- 62 2.85e- 58   96.4
 8 LARS2   NEO      6.97  3.45   132052  0.768 1.99e- 39 3.38e- 35   99.3
 9 GPHN    NEO     18.6   7.56   910148. 0.767 4.19e-103 3.55e- 99   99.5
10 LARS2   NEO      7.71  3.53   513471  0.760 1.28e- 70 2.18e- 66   99.5
# ℹ 2 more variables: pct_out <dbl>, celltype <fct>
DEG_NEO_ADJ_Seurat_df_sig %>% group_by(celltype) %>% slice_max(order_by = avg_log2FC, n = 10)
# A tibble: 133 × 7
# Groups:   celltype [17]
      p_val avg_log2FC pct.1 pct.2 p_val_adj feature celltype
      <dbl>      <dbl> <dbl> <dbl>     <dbl> <chr>   <fct>   
 1 8.83e-17       Inf  0.998 0.996  1.50e-12 FTH1    M1      
 2 1.11e-16       Inf  0.946 0.971  1.89e-12 LYZ     M1      
 3 2.28e- 8       261. 0.99  0.993  3.87e- 4 RPL37A  M1      
 4 3.68e-10       190. 0.991 0.994  6.24e- 6 FCER1G  M1      
 5 1.16e- 7       157. 0.984 0.985  1.97e- 3 RPL37   M1      
 6 1.34e- 7       139. 0.987 0.993  2.27e- 3 RPS16   M1      
 7 1.63e- 8       139. 0.974 0.981  2.77e- 4 RPL38   M1      
 8 7.49e- 9       129. 0.992 0.996  1.27e- 4 RPL23   M1      
 9 6.19e-11       118. 0.985 0.995  1.05e- 6 RPS11   M1      
10 1.16e- 7       115. 0.988 0.99   1.96e- 3 RPS23   M1      
# ℹ 123 more rows
# For each cluster, top DEGs between ADJ and CTRL
DEG_ADJ_CTRL <- lapply(unique(seurat_data$celltype), function(xx){
  data_subset <- subset(seurat_data, subset = celltype == xx)
  if (all((c("ADJ", "CTRL") %in% data_subset$Treatment) == c(T, T))){
    markers <- presto::wilcoxauc(data_subset,
                                 group_by = "Treatment",
                                 groups_use = c("ADJ", "CTRL"),
                                 assay = "data",
                                 seurat_assay = "RNA_human")
    #Idents(data_subset) <- data_subset$Treatment
    #markers <- FindMarkers(data_subset,
    #                       group.by = "celltype",
    #                       ident.1 = "ADJ",
    #                       ident.2 = "CTRL")
    markers$celltype <- xx
    
    return(markers)
  } else {
    return(NULL)
  }
})
names(DEG_ADJ_CTRL) <- unique(seurat_data$celltype)
DEG_ADJ_CTRL[sapply(DEG_ADJ_CTRL, is.null)] <- NULL

DEG_ADJ_CTRL_df <- as.data.frame(do.call(rbind, DEG_ADJ_CTRL))

DEG_ADJ_CTRL_df_sig <- DEG_ADJ_CTRL_df %>%
  filter(group == "NEO",
         padj < 0.01,
         auc > 0.6,
         (pct_in > 50 | pct_out > 50))

# Saving to a file
write.table(DEG_ADJ_CTRL_df_sig,
            "/scratch/hnatri/CART/PD1_CART_DEG_ADJ_CTRL_sig.tsv",
            sep = "\t", quote = F, row.names = F)

# Comparing to the Seurat function
DEG_ADJ_CTRL_Seurat <- lapply(unique(seurat_data$celltype), function(xx){
  data_subset <- subset(seurat_data, subset = celltype == xx)
  Idents(data_subset) <- data_subset$Treatment
  if (all((c("ADJ", "CTRL") %in% data_subset$Treatment) == c(T, T))){
    markers <- FindMarkers(data_subset,
                           ident.1 = "ADJ",
                           ident.2 = "CTRL",
                           assay = "RNA_human",
                           verbose = F)
    markers$feature <- rownames(markers)
    markers$celltype <- xx
    
    return(markers)
  } else {
    return(NULL)
  }
})
names(DEG_ADJ_CTRL_Seurat) <- unique(seurat_data$celltype)
DEG_ADJ_CTRL_Seurat[sapply(DEG_ADJ_CTRL_Seurat, is.null)] <- NULL

DEG_ADJ_CTRL_Seurat_df <- as.data.frame(do.call(rbind, DEG_ADJ_CTRL_Seurat))

# Distribution of log2FC
hist(DEG_ADJ_CTRL_Seurat_df$avg_log2FC)

DEG_ADJ_CTRL_Seurat_df_sig <- DEG_ADJ_CTRL_Seurat_df %>%
  filter(p_val_adj < 0.01,
         abs(avg_log2FC) > 100,
         (pct.1 > 0.50 | pct.2 > 0.50))

# Saving to a file
write.table(DEG_ADJ_CTRL_Seurat_df_sig,
            "/scratch/hnatri/CART/PD1_CART_DEG_ADJ_CTRL_Seurat_sig.tsv",
            sep = "\t", quote = F, row.names = F)

# Plotting
table(DEG_ADJ_CTRL_df_sig$celltype) %>% as.data.frame() %>%
  ggplot(aes(x = reorder(Var1, -Freq), y = Freq, fill = Var1)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = PD1_Kluc_celltype_col) +
    theme_classic2() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
    NoLegend() +
    xlab("Cell type") +
    ylab("# DEGs, ADJ vs. CTRL, all timepoints, Presto")

table(DEG_ADJ_CTRL_Seurat_df_sig$celltype) %>% as.data.frame() %>%
  ggplot(aes(x = reorder(Var1, -Freq), y = Freq, fill = Var1)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = PD1_Kluc_celltype_col) +
    theme_classic2() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
    NoLegend() +
    xlab("Cell type") +
    ylab("# DEGs, ADJ vs. CTRL, all timepoints, Seurat")

# Top DEGs for each cluster
DEG_ADJ_CTRL_df_sig %>% group_by(group) %>% slice_max(order_by = auc, n = 10)
# A tibble: 0 × 11
# Groups:   group [0]
# ℹ 11 variables: feature <chr>, group <chr>, avgExpr <dbl>, logFC <dbl>,
#   statistic <dbl>, auc <dbl>, pval <dbl>, padj <dbl>, pct_in <dbl>,
#   pct_out <dbl>, celltype <fct>
DEG_ADJ_CTRL_Seurat_df_sig %>% group_by(celltype) %>% slice_max(order_by = avg_log2FC, n = 10)
# A tibble: 115 × 7
# Groups:   celltype [18]
      p_val avg_log2FC pct.1 pct.2 p_val_adj feature celltype
      <dbl>      <dbl> <dbl> <dbl>     <dbl> <chr>   <fct>   
 1 7.63e-13       Inf  1     0.997  1.29e- 8 CTSD    M1      
 2 5.01e-17       203. 0.993 0.988  8.49e-13 RPS24   M1      
 3 1.21e-28       190. 0.994 0.991  2.05e-24 EEF1A1  M1      
 4 1.07e-15       181. 0.994 0.988  1.81e-11 RPS20   M1      
 5 2.83e-22       164. 0.994 0.993  4.79e-18 RPLP1   M1      
 6 1.68e- 9       152. 1     0.999  2.84e- 5 COX3    M1      
 7 1.15e-20       144. 0.994 0.983  1.94e-16 RPS10   M1      
 8 4.48e-19       137. 0.99  0.982  7.60e-15 RPL19   M1      
 9 2.44e-19       129. 0.994 0.986  4.14e-15 RPS12   M1      
10 3.86e-17       122. 0.992 0.982  6.54e-13 RPL32   M1      
# ℹ 105 more rows
# Overlap of DEGs between NEO vs. CTRL and NEO vs. ADJ
intersect(DEG_NEO_ADJ_df_sig$feature, DEG_NEO_CTRL_df_sig$feature)
 [1] "CAMK1D" "LARS2"  "GPHN"   "CMSS1"  "RPL37A" "RPS21"  "RPS27"  "RPL38" 
 [9] "RPS29"  "RPL37"  "RPS28"  "RPL39"  "MALAT1" "RPL32"  "RPL13"  "RPSA"  
[17] "RPS23"  "RPS24"  "FAU"   
inner_join(DEG_NEO_ADJ_df_sig, DEG_NEO_CTRL_df_sig, by = c('feature','celltype'))
   feature group.x avgExpr.x   logFC.x statistic.x     auc.x       pval.x
1    CMSS1     NEO  44.07056 12.263930    820416.0 0.6909517 9.241756e-54
2    RPS29     NEO  38.37190 11.712560    904754.0 0.6847350 9.447951e-52
3    CMSS1     NEO  20.84229  6.881847    975180.5 0.7380351 8.505467e-85
4   CAMK1D     NEO  23.73778 10.812127     45402.0 0.7501363 9.010737e-22
5    LARS2     NEO  15.74222  8.608393     50174.0 0.8289798 1.652220e-36
6     GPHN     NEO  29.48444 13.692623     48887.0 0.8077158 4.291708e-32
7    CMSS1     NEO  58.76444 18.522809     43418.5 0.7173647 8.288666e-17
8    CMSS1     NEO  33.93677  9.080192    386584.5 0.6956537 2.241076e-36
9    CMSS1     NEO  28.52984  7.510545    113774.5 0.6615450 2.933183e-15
10   CMSS1     NEO  28.63731  8.067385    203700.0 0.6962008 2.334658e-27
11   CMSS1     NEO  20.09013  6.038908    319232.5 0.6745593 7.146759e-27
12  MALAT1     NEO  75.67647 42.574021    328727.5 0.6946229 5.706574e-34
13   CMSS1     NEO  21.71179  6.093442   1172275.5 0.6709183 1.980533e-51
14   CMSS1     NEO  32.62084  6.457997    744954.0 0.6247203 5.337960e-17
15   CMSS1     NEO  30.89175  8.981102    501512.0 0.7217599 3.238796e-50
16   CMSS1     NEO  26.30723  9.721572     88103.0 0.7048353 5.031976e-20
17     FAU     NEO  46.76305 12.639546     75466.5 0.6037417 3.482375e-06
18   CMSS1     NEO  32.04228  6.929776     63913.5 0.6495274 5.389344e-09
         padj.x  pct_in.x pct_out.x celltype group.y avgExpr.y  logFC.y
1  2.609872e-50  99.81185  99.82095       L2     NEO  44.07056 25.84567
2  4.002152e-48  99.79339  99.56044       M3     NEO  38.37190 13.06040
3  1.441166e-80  99.10468  99.67033       M3     NEO  20.84229 11.63054
4  5.089264e-18 100.00000  99.62825       L3     NEO  23.73778 13.37384
5  2.799521e-32 100.00000  98.51301       L3     NEO  15.74222 10.21435
6  3.635935e-28 100.00000  98.88476       L3     NEO  29.48444 17.08117
7  3.511079e-13 100.00000 100.00000       L3     NEO  58.76444 35.87920
8  1.265760e-32  99.36766  99.20319       L5     NEO  33.93677 15.89116
9  1.242496e-11  99.63834 100.00000       B1     NEO  28.52984 13.58794
10 1.318615e-23  99.87047  99.73615       DC     NEO  28.63731 14.15736
11 1.345496e-23  99.43074  98.44098       M4     NEO  20.09013 10.58342
12 3.223073e-30  81.59393  53.22940       M4     NEO  75.67647 17.17781
13 1.118605e-47  97.87898  99.26606       M1     NEO  21.71179 11.23514
14 7.664949e-15  99.74406  99.54128       M2     NEO  32.62084 14.67971
15 1.371954e-46 100.00000  99.80989       M8     NEO  30.89175 12.60470
16 4.263090e-16  99.59839  99.60159    Neut1     NEO  26.30723 14.63644
17 1.282725e-03  99.59839  98.00797    Neut1     NEO  46.76305 10.81802
18 9.223944e-07  99.67480 100.00000       L6     NEO  32.04228 12.92813
   statistic.y     auc.y        pval.y        padj.y  pct_in.y pct_out.y
1     982892.0 0.8848227 1.226684e-205 2.078494e-201  99.81185  99.90431
2    1677764.5 0.7071513  4.227048e-88  1.193718e-84  99.79339  99.32681
3    2089805.0 0.8808199 2.906263e-293 4.924372e-289  99.10468  99.26561
4      56616.0 0.8250055  1.476869e-37  6.256016e-34 100.00000  99.34426
5      60629.0 0.8834827  1.068388e-51  9.051381e-48 100.00000  96.72131
6      59891.5 0.8727359  7.992881e-49  4.514379e-45 100.00000  99.67213
7      61586.0 0.8974281  3.141033e-55  5.322167e-51 100.00000 100.00000
8     840013.0 0.8239081 1.128347e-139 1.911872e-135  99.36766  99.56569
9     248815.5 0.7921437  2.317253e-64  3.926354e-60  99.63834  99.47183
10    496041.0 0.8312292 1.297642e-112 2.198725e-108  99.87047  99.87063
11    950305.0 0.8071778 1.119818e-135 1.897419e-131  99.43074  97.67234
12    695588.5 0.5908247  1.072953e-13  1.515009e-10  81.59393  67.94987
13   2802830.5 0.8083637 1.160786e-230 1.966835e-226  97.87898  98.01202
14   3113471.0 0.7883522 4.053702e-207 6.868593e-203  99.74406  99.51524
15   1575695.0 0.7962648 7.367929e-163 1.248422e-158 100.00000  99.73298
16    652640.0 0.8094639  2.858738e-97  4.843846e-93  99.59839  97.89994
17    474496.0 0.5885134  2.183414e-09  3.894292e-07  99.59839  99.19704
18    194134.5 0.7832900  6.583250e-53  1.115466e-48  99.67480  99.75186
write.table(inner_join(DEG_NEO_ADJ_df_sig, DEG_NEO_CTRL_df_sig, by = c('feature','celltype')),
            "/scratch/hnatri/CART/PD1_CART_DEG_NEO_ADJ_CTRL_overlap.tsv",
            sep = "\t", quote = F, row.names = F)

# Overlap with Seurat
intersect(DEG_NEO_ADJ_Seurat_df_sig$feature, DEG_NEO_CTRL_Seurat_df_sig$feature)
 [1] "EEF1A1"  "COX3"    "ATP6"    "COX1"    "ACTG1"   "ND4"     "TMSB10" 
 [8] "RPS27"   "RPL30"   "RPLP2"   "FAU"     "GPHN"    "RPS29"   "HEXB"   
[15] "NAV3"    "LY86"    "C1QA"    "CST3"    "ACTB"    "PLXDC2"  "LARS2"  
[22] "RPS7"    "RPL19"   "RPS10"   "RPS16"   "RPL13"   "RPS14"   "RPS9"   
[29] "RPL11"   "RPL18A"  "RPL32"   "RPS8"    "RPL23"   "RPL27"   "RPL27A" 
[36] "RPS13"   "RPL34"   "CMSS1"   "PLAC8"   "RPL10.1" "RPL9"    "RPS3A"  
[43] "RPL18"   "FCER1G"  "RPS24"   "RPS18"   "RPS5"    "RPS11"   "RPS12"  
[50] "ITM2B"   "RPS23"   "IFITM3"  "RPS15A"  "RPL21"   "UBB"     "RPS28"  
[57] "RPS19"   "RPS21"   "RPLP1"   "RPL39"   "RPSA"    "RPL15"   "RPL22"  
[64] "RPL36"   "RPL6"    "RPL37A"  "RPL38"   "RPL35A"  "RPS20"   "RPL37"  
[71] "PTMA"    "CTSD"    "CD74"    "LYZ"     "C1QB"    "C1QC"    "RPL13A" 
[78] "GPNMB"   "CSTB"    "CAMK1D"  "CSMD3"   "RPS25"  
inner_join(DEG_NEO_ADJ_Seurat_df_sig, DEG_NEO_CTRL_Seurat_df_sig,
           by = c('feature','celltype'))
        p_val.x avg_log2FC.x pct.1.x pct.2.x  p_val_adj.x feature celltype
1  3.966681e-27    -259.6136   0.984   1.000 6.721145e-23   ACTG1       L2
2  2.751982e-13     278.5116   0.991   0.999 4.662959e-09   RPS27       L2
3  1.228817e-27     106.0853   1.000   1.000 2.082108e-23    HEXB       M3
4  2.364794e-22    -152.1571   0.925   0.947 4.006907e-18    NAV3       M3
5  1.166178e-19    -229.9925   1.000   1.000 1.975972e-15    LY86       M3
6  2.125357e-17    -100.2201   0.999   0.996 3.601205e-13    C1QA       M3
7  1.229275e-14     113.2988   0.999   0.998 2.082884e-10    ACTB       M3
8  1.129624e-11     359.9997   1.000   1.000 1.914035e-07  PLXDC2       M3
9  2.274650e-43     129.0535   0.944   0.925 3.854167e-39   LARS2       L1
10 5.786252e-30     103.9476   0.976   0.989 9.804225e-26    GPHN       L1
11 5.634798e-08     124.1453   0.991   0.989 9.547602e-04   CMSS1       L1
12 9.636771e-19     306.2082   0.887   0.909 1.632854e-14  RPL27A       M5
13 1.094706e-16     374.0149   0.852   0.937 1.854869e-12   ITM2B       M5
14 2.419081e-16     271.5835   0.781   0.897 4.098892e-12    CST3       M5
15 5.159923e-15     335.0621   0.927   0.937 8.742974e-11     FAU       M5
16 2.040173e-14     177.6219   0.741   0.853 3.456870e-10   RPS28       M5
17 1.454476e-13     320.0405   0.825   0.869 2.464464e-09   RPS21       M5
18 1.475223e-13     342.2756   0.927   0.925 2.499619e-09   RPLP1       M5
19 8.961443e-12     364.2642   0.893   0.913 1.518427e-07  RPL37A       M5
20 3.735451e-10    -117.6887   0.993   0.994 6.329348e-06    CD74       B1
21 3.224185e-30     249.7978   0.551   0.247 5.463059e-26   PLAC8       M4
22 7.146759e-27     129.0493   0.994   0.984 1.210947e-22   CMSS1       M4
23 3.247347e-38    -133.2941   0.997   0.999 5.502304e-34    C1QB       M1
24 3.680739e-10     189.8788   0.991   0.994 6.236644e-06  FCER1G       M1
25 1.632670e-08     139.3850   0.974   0.981 2.766396e-04   RPL38       M1
26 2.284859e-08     260.5714   0.990   0.993 3.871465e-04  RPL37A       M1
27 1.160680e-07     156.6973   0.984   0.985 1.966656e-03   RPL37       M1
28 3.085565e-12          Inf   0.552   0.372 5.228181e-08   GPNMB       M2
29 1.340781e-08     132.9642   0.986   0.968 2.271819e-04    CSTB       M2
30 3.238796e-50     425.2534   1.000   0.998 5.487815e-46   CMSS1       M8
31 3.613081e-09     213.9731   0.962   0.845 6.122005e-05  RPL37A    Neut1
32 8.027699e-09     293.3200   0.982   0.948 1.360213e-04   RPS27    Neut1
33 2.827164e-07     105.7710   0.855   0.618 4.790346e-03    RPSA    Neut1
34 4.782714e-07     182.2339   0.878   0.717 8.103831e-03   RPS20    Neut1
         p_val.y avg_log2FC.y pct.1.y pct.2.y   p_val_adj.y
1   8.473544e-08     198.8842   0.984   0.994  1.435757e-03
2   3.552264e-22     288.5140   0.991   0.996  6.018956e-18
3   1.686818e-14    -262.4001   1.000   1.000  2.858144e-10
4   4.565600e-29    -115.2452   0.925   0.965  7.735953e-25
5   4.264387e-12    -478.7343   1.000   1.000  7.225577e-08
6   1.160230e-12    -239.3170   0.999   0.998  1.965894e-08
7   8.332439e-16         -Inf   0.999   0.999  1.411848e-11
8   1.824855e-08     161.7522   1.000   1.000  3.092034e-04
9  1.744941e-144     126.8739   0.944   0.851 2.956628e-140
10 1.199379e-131     125.8426   0.976   0.968 2.032227e-127
11 3.754804e-158     195.3417   0.991   0.988 6.362139e-154
12  5.181638e-08     160.4532   0.887   0.887  8.779768e-04
13  1.150034e-07     257.1138   0.852   0.875  1.948618e-03
14  1.645691e-08    -328.6204   0.781   0.821  2.788460e-04
15  3.660547e-07     154.6825   0.927   0.919  6.202431e-03
16  5.985423e-08     125.3737   0.741   0.762  1.014170e-03
17  2.106299e-08     183.5364   0.825   0.823  3.568913e-04
18  1.149348e-07     120.0578   0.927   0.899  1.947456e-03
19  3.720020e-08     226.8172   0.893   0.889  6.303201e-04
20  2.630351e-08         -Inf   0.993   0.995  4.456866e-04
21  1.181236e-11     196.2202   0.551   0.389  2.001486e-07
22 1.119818e-135     138.5528   0.994   0.977 1.897419e-131
23  1.052123e-08    -151.0507   0.997   0.998  1.782718e-04
24  2.582953e-10     114.3955   0.991   0.989  4.376555e-06
25  5.101458e-20     156.2432   0.974   0.969  8.643910e-16
26  7.520665e-18     235.5916   0.990   0.989  1.274301e-13
27  3.141299e-09     189.4252   0.984   0.983  5.322617e-05
28  4.325826e-07          Inf   0.552   0.467  7.329679e-03
29  5.554667e-08    -312.5436   0.986   0.983  9.411827e-04
30 7.367929e-163     386.8237   1.000   0.997 1.248422e-158
31  3.809009e-12    -107.9439   0.962   0.839  6.453986e-08
32  1.086127e-14     104.1322   0.982   0.942  1.840334e-10
33  8.395429e-09    -145.4540   0.855   0.687  1.422522e-04
34  2.616479e-10    -195.9488   0.878   0.726  4.433362e-06
write.table(inner_join(DEG_NEO_ADJ_Seurat_df_sig, DEG_NEO_CTRL_Seurat_df_sig,
                       by = c('feature','celltype')),
            "/scratch/hnatri/CART/PD1_CART_DEG_NEO_ADJ_CTRL_overlap.tsv",
            sep = "\t", quote = F, row.names = F)

DEGs between treatment groups by timepoint

Comparing NEO vs. CTRL on Day 12 and NEO vs. CTRL, NEO vs. ADJ, and ADJ vs. CTRL on Day 16.

# A function for running the analysis
get_degs <- function(group1, group2, day){

  # For each cluster, top DEGs between group1 and group2 on day x
  # Using the Seurat function
  DEG_list <- lapply(unique(seurat_data$celltype), function(xx){
    data_subset <- subset(seurat_data, subset = celltype == xx)
    data_subset <- subset(data_subset, subset = Day == day)
    Idents(data_subset) <- data_subset$Treatment
    if (all((c(group1, group2) %in% data_subset$Treatment) == c(T, T))){
      markers <- FindMarkers(data_subset,
                             ident.1 = group1,
                             ident.2 = group2,
                             assay = "RNA_human",
                             verbose = F)
      markers$feature <- rownames(markers)
      markers$celltype <- xx
      
      return(markers)
    } else {
      return(NULL)
    }
  })
  names(DEG_list) <- unique(seurat_data$celltype)
  DEG_list[sapply(DEG_list, is.null)] <- NULL
  
  DEG_df <- as.data.frame(do.call(rbind, DEG_list))
  
  DEG_df_sig <- DEG_df %>%
    filter(p_val_adj < 0.01,
           abs(avg_log2FC) > 100,
           (pct.1 > 0.50 | pct.2 > 0.50))
  
  # Saving to a file
  filename <- paste0("/scratch/hnatri/CART/PD1_CART_DEG_", group1, "_", group2,
                     "_", day, "_Seurat_sig.tsv")
  write.table(DEG_df_sig, filename,
              sep = "\t", quote = F, row.names = F)
  
  return(DEG_df_sig)
}

# For each comparison and timepoint, calling DEGs and plotting
DEG_NEO_CTRL_D12 <- get_degs(group1 = "NEO",
                             group2 = "CTRL", 
                             day = 12)

DEG_NEO_CTRL_D16 <- get_degs(group1 = "NEO",
                             group2 = "CTRL", 
                             day = 16)

DEG_NEO_ADJ_D16 <- get_degs(group1 = "NEO",
                            group2 = "ADJ", 
                            day = 16)

DEG_ADJ_CTRL_D16 <- get_degs(group1 = "ADJ",
                             group2 = "CTRL", 
                             day = 16)

length(unique(DEG_NEO_CTRL_D12$feature))
[1] 47
length(unique(DEG_NEO_CTRL_D16$feature))
[1] 87
length(unique(DEG_NEO_ADJ_D16$feature))
[1] 104
length(unique(DEG_ADJ_CTRL_D16$feature))
[1] 104
# Plotting
p1 <- table(DEG_NEO_CTRL_D12$celltype) %>% as.data.frame() %>%
  ggplot(aes(x = reorder(Var1, -Freq), y = Freq, fill = Var1)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = PD1_Kluc_celltype_col) +
    theme_classic2() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
    NoLegend() +
    xlab("Cell type") +
    ylab("# DEGs NEO vs. CTRL, Day 12")

p2 <- table(DEG_NEO_CTRL_D16$celltype) %>% as.data.frame() %>%
  ggplot(aes(x = reorder(Var1, -Freq), y = Freq, fill = Var1)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = PD1_Kluc_celltype_col) +
    theme_classic2() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
    NoLegend() +
    xlab("Cell type") +
    ylab("# DEGs NEO vs. CTRL, Day 16")

p3 <- table(DEG_NEO_ADJ_D16$celltype) %>% as.data.frame() %>%
  ggplot(aes(x = reorder(Var1, -Freq), y = Freq, fill = Var1)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = PD1_Kluc_celltype_col) +
    theme_classic2() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
    NoLegend() +
    xlab("Cell type") +
    ylab("# DEGs NEO vs. ADJ, Day 16")

p4 <- table(DEG_ADJ_CTRL_D16$celltype) %>% as.data.frame() %>%
  ggplot(aes(x = reorder(Var1, -Freq), y = Freq, fill = Var1)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = PD1_Kluc_celltype_col) +
    theme_classic2() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
    NoLegend() +
    xlab("Cell type") +
    ylab("# DEGs ADJ vs. CTRL, Day 16")

# Top DEGs for each cluster
#DEG_df_sig %>% group_by(celltype) %>% slice_max(order_by = avg_log2FC, n = 10
(p1 | p2) / (p3 | p4)

Version Author Date
7c73c80 heinin 2024-05-06

Looking at DEGs unique to/shared between groups

# Overlap of significant DEGs in each comparison
input_list <- list("NEO vs. CTRL, Day 12" = DEG_NEO_CTRL_D12$feature,
                   "NEO vs. CTRL, Day 16" = DEG_NEO_CTRL_D16$feature,
                   "NEO vs. ADJ, Day 16" = DEG_NEO_ADJ_D16$feature,
                   "ADJ vs. CTRL, Day 16" = DEG_ADJ_CTRL_D16$feature)

upset(fromList(input_list), order.by = "freq")

# Plotting overlaps without RP and MT genes
RBMTgenes <- grep(pattern = "^RP[SL]|^MRP[SL]|^MT-", x = rownames(seurat_data@assays$RNA_human@data), value = TRUE)
    
input_list <- lapply(input_list, function(xx){
  setdiff(xx, RBMTgenes)
})

upset(fromList(input_list), order.by = "freq")

# Looking at genes in intersections
# Unique to ADJ_CTRL_D16
setdiff(DEG_ADJ_CTRL_D16$feature, c(DEG_NEO_ADJ_D16$feature,
                                    DEG_NEO_CTRL_D16$feature,
                                    DEG_NEO_CTRL_D12$feature,
                                    RBMTgenes))
 [1] "LGALS1" "SSH2"   "CACNB2" "CBLB"   "RACK1"  "EEF2"   "CELF2"  "FUT8"  
 [9] "ETV6"   "CCL17" 
# Unique to DEG_NEO_CTRL_D16
setdiff(DEG_NEO_CTRL_D16$feature, c(DEG_NEO_ADJ_D16$feature,
                                    DEG_ADJ_CTRL_D16$feature,
                                    DEG_NEO_CTRL_D12$feature,
                                    RBMTgenes))
 [1] "UBB"      "APOE"     "GLUL"     "ZFHX3"    "CSF1R"    "ITGB5"   
 [7] "APBB1IP"  "ENTPD1"   "EPB41L2"  "DOCK4"    "TOX"      "ZFP36L1" 
[13] "BTG2"     "TYROBP"   "CREG1"    "CEBPB"    "S100A6"   "CTSC"    
[19] "GADD45B"  "RETNLB.2" "STAT1"    "IER3"     "INPP5D"  
# Unique to DEG_NEO_ADJ_D16
setdiff(DEG_NEO_ADJ_D16$feature, c(DEG_NEO_CTRL_D16$feature,
                                   DEG_ADJ_CTRL_D16$feature,
                                   DEG_NEO_CTRL_D12$feature,
                                   RBMTgenes))
 [1] "NPM1"     "SATB1"    "SKAP1"    "FOXP1"    "ARHGAP15" "NAV3"    
 [7] "PLXDC2"   "MERTK"    "TGFBR1"   "C1QB"     "PFN1"     "SPP1"    
[13] "CCL2"     "LPL"      "PRDX5"    "TXN"      "GZMB.2"   "SH3BGRL3"
[19] "S100A10"  "ISG15"    "CD63"     "ND2"      "TGFBI"    "SRGN"    
# Unique to DEG_NEO_ADJ_D16
setdiff(DEG_NEO_CTRL_D12$feature, c(DEG_NEO_CTRL_D16$feature,
                                    DEG_ADJ_CTRL_D16$feature,
                                    DEG_NEO_ADJ_D16$feature,
                                    RBMTgenes))
 [1] "GZMA"     "TMCC3"    "RABGAP1L" "PDE4D"    "GAPDH"    "IFITM3"  
 [7] "VIM"      "CD274"    "SLPI"     "ALCAM"   
# Shared between NEO vs. ADJ and ADJ vs. CTRL, without RB/MT genes
setdiff(intersect(DEG_NEO_ADJ_D16$feature, DEG_ADJ_CTRL_D16$feature), RBMTgenes)
 [1] "EEF1A1" "COX3"   "ATP6"   "ACTG1"  "PPIA"   "ND4"    "COX1"   "B2M"   
 [9] "MYL6"   "CCL5"   "GPHN"   "HEXB"   "LY86"   "CST3"   "CMSS1"  "FTH1"  
[17] "ACTB"   "CYTB"   "LY6E"   "TANC2"  "FAU"    "C1QC"   "FTL"    "ITM2B" 
[25] "HSPA8"  "CTSB"   "CTSD"   "ND1"    "PSAP"   "PID1"  

Plotting SPP1

# Normalizing all features in the Seurat object for plotting
seurat_data <- NormalizeData(seurat_data, assay = "RNA", verbose = F)
seurat_data <- NormalizeData(seurat_data, assay = "RNA_human", verbose = F)

# Looking at SPP1
DEG_NEO_ADJ_D16 %>% filter(feature == "SPP1")
               p_val avg_log2FC pct.1 pct.2    p_val_adj feature celltype
M4.SPP1 1.233403e-09        Inf 0.639 0.488 2.089878e-05    SPP1       M4
VlnPlot(seurat_data,
        features = "SPP1",
        group.by = "celltype",
        split.by = "Treatment_Day",
        assay = "RNA_human",
        layer = "data",
        pt.size = 0,
        log = T,
        cols = treatment_day_col)
The default behaviour of split.by has changed.
Separate violin plots are now plotted side-by-side.
To restore the old behaviour of a single split violin,
set split.plot = TRUE.
      
This message will be shown once per session.

Human T cell canonical markers

# Canonical T cell markers
gs4_deauth()
canonical_markers  <- gs4_get("https://docs.google.com/spreadsheets/d/186PIcU3Jb0Xm3IuZgENEChluykvwsK4JnHF8FbeJASA/edit?usp=sharing")
sheet_names(canonical_markers)
 [1] "T cells"                          "T cells, gene sets"              
 [3] "Sheet12"                          "13384 product gene sets"         
 [5] "Luo et al."                       "Exhaustion"                      
 [7] "Macrophage gene sets"             "Tumor"                           
 [9] "Angela T cells"                   "Pediatric_T_cells"               
[11] "Myeloid/Lymphoid (excl. T cells)" "Info"                            
canonical_markers <- read_sheet(canonical_markers, sheet = "T cells, gene sets")
✔ Reading from "CAR T project cell type markers".
✔ Range ''T cells, gene sets''.
head(canonical_markers)
# A tibble: 6 × 7
  RNA   Protein Marker_type Gene_sets Bigger_gene_sets Protein_type Note 
  <chr> <chr>   <chr>       <chr>     <chr>            <chr>        <chr>
1 CD8A  CD8     Tc          <NA>      Cytotoxic        receptor     <NA> 
2 CD8B  CD8     Tc          <NA>      Cytotoxic        receptor     <NA> 
3 CD8B2 CD8     Tc          <NA>      Cytotoxic        receptor     <NA> 
4 GNLY  <NA>    Tc          <NA>      Cytotoxic        other        <NA> 
5 GZMB  <NA>    Tc          <NA>      Cytotoxic        other        <NA> 
6 GZMH  <NA>    Tc/TNK      <NA>      Cytotoxic        other        <NA> 
# Overlapping with cluster markers
sig_markers %>% filter(feature %in% canonical_markers$RNA)
    feature group   avgExpr     logFC statistic       auc          pval
1      BTLA    B1  2.819832  2.451013  53270252 0.8333870  0.000000e+00
2      SELL   ILC  4.937947  3.930917  31398107 0.8283679  0.000000e+00
3      ICOS    L1  4.471437  3.884835 119733465 0.8108617  0.000000e+00
4      CD3D    L1  6.326601  4.223792 121737508 0.8244336  0.000000e+00
5      CD3E    L1  4.371321  3.000948 122100686 0.8268931  0.000000e+00
6     STAT4    L2  5.501705  3.662848 112624549 0.8151046  0.000000e+00
7      NKG7    L2 15.941395 12.806511 123419183 0.8932293  0.000000e+00
8      CD3D    L2 10.119070  8.277972 123397668 0.8930735  0.000000e+00
9      CD3E    L2  6.381705  5.145780 121161490 0.8768895  0.000000e+00
10    IL2RB    L2  6.552868  4.819393 117359040 0.8493698  0.000000e+00
11     CD28    L3  4.533166  3.924949  30926318 0.8550095  0.000000e+00
12    CTLA4    L3  7.095119  6.309477  29457939 0.8144137  0.000000e+00
13    IKZF2    L3 29.190238 25.295517  29100812 0.8045404 5.421193e-247
14      TOX    L3 16.262829 14.292937  29105263 0.8046634 1.170115e-305
15  TNFRSF9    L3  6.833542  6.255278  29718887 0.8216281  0.000000e+00
16 TNFRSF18    L3  5.161452  4.512479  30427185 0.8412102  0.000000e+00
17     NKG7    L3 26.106383 22.464611  31283052 0.8648720  0.000000e+00
18    ITGAL    L3  5.914894  3.857129  29504929 0.8157128 5.638105e-223
19    MKI67    L3 18.627034 18.048129  34333974 0.9492198  0.000000e+00
20     CD3D    L3 21.670839 19.590012  34159134 0.9443861  0.000000e+00
21     CD3E    L3 13.697121 12.314550  33808550 0.9346936  0.000000e+00
22    IL2RB    L3 13.021277 11.143698  33106946 0.9152966  0.000000e+00
23    IL2RB    L5  6.216996  4.387073  89836868 0.8155595  0.000000e+00
24    PTPRC    L6 30.775042 17.970627  43098701 0.8150038 1.061915e-299
25     LAG3    M1  5.020593  3.908881 167653373 0.8377210  0.000000e+00
26    MKI67    M7 12.376543 11.607068  21543860 0.9724882  0.000000e+00
27   ENTPD1    M7 10.506173  6.251626  18010500 0.8129926 5.791522e-129
28     NKG7    NK 16.894083 13.675464 108545254 0.9149106  0.000000e+00
29     PRF1    NK  5.254931  4.989393 108589986 0.9152877  0.000000e+00
30     GZMA    NK 52.561359 49.296860 113350922 0.9554169  0.000000e+00
31    IL2RB    NK  7.514974  5.788128 105529013 0.8894872  0.000000e+00
32    STAT4  Treg 14.828969 13.080669  48497993 0.8849509  0.000000e+00
33     CCR7  Treg 17.340426 17.222959  52537460 0.9586597  0.000000e+00
            padj   pct_in   pct_out
1   0.000000e+00 77.16480 13.300625
2   0.000000e+00 85.79952 31.487254
3   0.000000e+00 74.23543 16.069291
4   0.000000e+00 90.30583 25.540455
5   0.000000e+00 89.03635 22.564139
6   0.000000e+00 92.06202 35.491551
7   0.000000e+00 98.13953 27.842872
8   0.000000e+00 94.38760 25.597517
9   0.000000e+00 92.62016 22.668285
10  0.000000e+00 94.82171 30.328634
11  0.000000e+00 85.73217 22.122819
12  0.000000e+00 78.22278 22.997570
13 7.227120e-246 82.97872 38.639275
14 2.626016e-304 79.47434 27.075326
15  0.000000e+00 75.46934 15.754363
16  0.000000e+00 84.73091 24.996687
17  0.000000e+00 88.98623 31.771593
18 6.215489e-222 95.86984 58.756351
19  0.000000e+00 93.99249 13.569693
20  0.000000e+00 96.12015 29.253369
21  0.000000e+00 95.11890 26.372874
22  0.000000e+00 98.87359 33.713276
23  0.000000e+00 90.63241 31.601553
24 2.949686e-297 99.23599 94.520060
25  0.000000e+00 90.03295 32.844006
26  0.000000e+00 99.58848 14.062260
27 1.225113e-127 99.58848 69.786982
28  0.000000e+00 99.63477 28.538460
29  0.000000e+00 88.64134  9.438970
30  0.000000e+00 96.20161 16.288569
31  0.000000e+00 98.39299 30.827814
32  0.000000e+00 92.47136 38.007002
33  0.000000e+00 92.88052  5.779651
canonical_markers %>% filter(RNA %in% sig_markers$feature) %>% as.data.frame()
        RNA      Protein                       Marker_type   Gene_sets
1      NKG7         <NA>                                Tc        <NA>
2      PRF1         <NA>                                Tc        <NA>
3     CTLA4 CD152-CTLA-4                     Trm/Tregs/Tex Dysfunction
4      LAG3  CD223-LAG-3                               Tex Dysfunction
5  TNFRSF18   CD357-GITR                               Tex        <NA>
6    ENTPD1         CD39                               Tex        <NA>
7       TOX         <NA>                               Tex        <NA>
8     IL2RB CD122-IL-2RB            Tact/Tscm/Tcm/Tem/Teff        <NA>
9      CCR7   CD197-CCR7                    Tnmem/Tscm/Tcm        <NA>
10     CD28         CD28           Tnmem/Tscm/Tcm/Tem/Teff        <NA>
11    PTPRC       CD45RA Tnmem/resting_Tregs/Tscm/Tem/Teff        <NA>
12    PTPRC       CD45RO                         Tmem/Tact        <NA>
13     SELL        CD62L                    Tnmem/Tscm/Tcm        <NA>
14    STAT4         <NA>                               Th1      Memory
15    ITGAL        CD11a             Tem/Teff/Tcm/Trm/Tscm        <NA>
16  TNFRSF9  CD137-4-1BB                              Tact        <NA>
17     BTLA   CD272-BTLA                               Tex        <NA>
18     ICOS   CD278-ICOS                          Tact,Th2        <NA>
19     CD3E          CD3                              PanT        <NA>
20     CD3D          CD3                              PanT        <NA>
21     GZMA         <NA>                          Tem/Teff        <NA>
22    MKI67         <NA>                              Tact        <NA>
23    CTLA4         <NA>                              Treg        <NA>
24    IKZF2         <NA>                              Treg        <NA>
   Bigger_gene_sets Protein_type                       Note
1         Cytotoxic        other                       <NA>
2         Cytotoxic        other                       <NA>
3       Dysfunction     receptor                       <NA>
4       Dysfunction     receptor                       <NA>
5       Dysfunction     receptor                       <NA>
6       Dysfunction        other                       <NA>
7       Dysfunction           tf                       <NA>
8            Memory     receptor                       <NA>
9            Memory     receptor                       <NA>
10           Memory     receptor                       <NA>
11           Memory     receptor             Splice variant
12           Memory     receptor             Splice variant
13           Memory     receptor                       <NA>
14           Memory           tf                       <NA>
15             <NA>     receptor                       <NA>
16             <NA>     receptor                       <NA>
17             <NA>     receptor                       <NA>
18             <NA>     receptor                       <NA>
19             <NA>     receptor NK cells don't express CD3
20             <NA>     receptor NK cells don't express CD3
21             <NA>        other                       <NA>
22             <NA>        other                       <NA>
23             Treg         <NA>                       <NA>
24             Treg         <NA>                       <NA>
# Overlapping with DEGs
DEG_NEO_CTRL_df_sig %>% filter(feature %in% canonical_markers$RNA)
         feature group  avgExpr    logFC statistic       auc         pval
L3.20167     TOX   NEO 25.27111 14.89078   42957.5 0.6259745 6.206806e-07
                padj   pct_in  pct_out celltype
L3.20167 0.000876401 83.55556 77.04918       L3
DEG_NEO_ADJ_df_sig %>% filter(feature %in% canonical_markers$RNA)
            feature group  avgExpr    logFC statistic       auc         pval
Neut1.22803   FOXP1   NEO 4.574297 1.948799   76120.5 0.6089737 4.691125e-07
L6.31902        TNF   NEO 1.978862 1.285112   63298.0 0.6432724 2.170169e-09
                    padj   pct_in  pct_out celltype
Neut1.22803 3.001419e-04 66.86747 49.40239    Neut1
L6.31902    4.430283e-07 55.93496 31.87500       L6
# Overlapping with DEGs from the Seurat analysis
DEG_NEO_CTRL_Seurat_df_sig %>% filter(feature %in% canonical_markers$RNA)
[1] p_val      avg_log2FC pct.1      pct.2      p_val_adj  feature    celltype  
<0 rows> (or 0-length row.names)
DEG_NEO_ADJ_Seurat_df_sig %>% filter(feature %in% canonical_markers$RNA)
                p_val avg_log2FC pct.1 pct.2    p_val_adj feature celltype
L2.FOXP1 2.494498e-10   190.5059 0.937 0.965 4.226677e-06   FOXP1       L2
L2.PTPRC 2.728027e-10   101.0505 0.999 0.999 4.622369e-06   PTPRC       L2
# To build on command line, run Rscript -e "rmarkdown::render('/home/hnatri/PD1_mm/analysis/comparative_analysis.Rmd')"
# Then "mv *.html /home/hnatri/PD1_mm/docs/"

sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] ComplexHeatmap_2.18.0       viridis_0.6.3              
 [3] viridisLite_0.4.2           circlize_0.4.15            
 [5] plyr_1.8.8                  RColorBrewer_1.1-3         
 [7] UpSetR_1.4.0                scProportionTest_0.0.0.9000
 [9] googlesheets4_1.1.0         workflowr_1.7.1            
[11] ggrepel_0.9.3               ggpubr_0.6.0               
[13] lubridate_1.9.2             forcats_1.0.0              
[15] stringr_1.5.0               dplyr_1.1.2                
[17] purrr_1.0.1                 readr_2.1.4                
[19] tidyr_1.3.0                 tibble_3.2.1               
[21] ggplot2_3.4.2               tidyverse_2.0.0            
[23] SeuratDisk_0.0.0.9021       Seurat_5.0.1               
[25] SeuratObject_5.0.1          sp_1.6-1                   

loaded via a namespace (and not attached):
  [1] RcppAnnoy_0.0.20       splines_4.3.0          later_1.3.1           
  [4] cellranger_1.1.0       polyclip_1.10-4        fastDummies_1.7.3     
  [7] lifecycle_1.0.3        rstatix_0.7.2          doParallel_1.0.17     
 [10] rprojroot_2.0.3        globals_0.16.2         processx_3.8.1        
 [13] lattice_0.21-8         hdf5r_1.3.8            MASS_7.3-60           
 [16] backports_1.4.1        magrittr_2.0.3         limma_3.58.1          
 [19] plotly_4.10.2          sass_0.4.6             rmarkdown_2.22        
 [22] jquerylib_0.1.4        yaml_2.3.7             httpuv_1.6.11         
 [25] sctransform_0.4.1      spam_2.9-1             spatstat.sparse_3.0-1 
 [28] reticulate_1.29        cowplot_1.1.1          pbapply_1.7-0         
 [31] abind_1.4-5            Rtsne_0.16             presto_1.0.0          
 [34] BiocGenerics_0.48.1    git2r_0.32.0           S4Vectors_0.40.2      
 [37] IRanges_2.36.0         irlba_2.3.5.1          listenv_0.9.0         
 [40] spatstat.utils_3.0-3   goftest_1.2-3          RSpectra_0.16-1       
 [43] spatstat.random_3.1-5  fitdistrplus_1.1-11    parallelly_1.36.0     
 [46] leiden_0.4.3           codetools_0.2-19       tidyselect_1.2.0      
 [49] shape_1.4.6            farver_2.1.1           stats4_4.3.0          
 [52] matrixStats_1.0.0      spatstat.explore_3.2-1 googledrive_2.1.0     
 [55] jsonlite_1.8.5         GetoptLong_1.0.5       ellipsis_0.3.2        
 [58] progressr_0.13.0       iterators_1.0.14       ggridges_0.5.4        
 [61] survival_3.5-5         foreach_1.5.2          tools_4.3.0           
 [64] ica_1.0-3              Rcpp_1.0.10            glue_1.6.2            
 [67] gridExtra_2.3          xfun_0.39              withr_2.5.0           
 [70] fastmap_1.1.1          fansi_1.0.4            callr_3.7.3           
 [73] digest_0.6.31          timechange_0.2.0       R6_2.5.1              
 [76] mime_0.12              colorspace_2.1-0       Cairo_1.6-0           
 [79] scattermore_1.2        tensor_1.5             spatstat.data_3.0-1   
 [82] utf8_1.2.3             generics_0.1.3         data.table_1.14.8     
 [85] httr_1.4.6             htmlwidgets_1.6.2      whisker_0.4.1         
 [88] uwot_0.1.14            pkgconfig_2.0.3        gtable_0.3.3          
 [91] lmtest_0.9-40          htmltools_0.5.5        carData_3.0-5         
 [94] dotCall64_1.0-2        clue_0.3-64            scales_1.2.1          
 [97] png_0.1-8              knitr_1.43             rstudioapi_0.14       
[100] rjson_0.2.21           tzdb_0.4.0             reshape2_1.4.4        
[103] curl_5.0.0             nlme_3.1-162           cachem_1.0.8          
[106] zoo_1.8-12             GlobalOptions_0.1.2    KernSmooth_2.23-21    
[109] vipor_0.4.5            parallel_4.3.0         miniUI_0.1.1.1        
[112] ggrastr_1.0.2          pillar_1.9.0           vctrs_0.6.2           
[115] RANN_2.6.1             promises_1.2.0.1       car_3.1-2             
[118] xtable_1.8-4           cluster_2.1.4          beeswarm_0.4.0        
[121] evaluate_0.21          magick_2.7.4           cli_3.6.1             
[124] compiler_4.3.0         rlang_1.1.1            crayon_1.5.2          
[127] future.apply_1.11.0    ggsignif_0.6.4         labeling_0.4.2        
[130] ps_1.7.5               ggbeeswarm_0.7.2       getPass_0.2-4         
[133] fs_1.6.2               stringi_1.7.12         deldir_1.0-9          
[136] munsell_0.5.0          lazyeval_0.2.2         spatstat.geom_3.2-1   
[139] Matrix_1.6-5           RcppHNSW_0.5.0         hms_1.1.3             
[142] patchwork_1.1.2        bit64_4.0.5            future_1.32.0         
[145] statmod_1.5.0          shiny_1.7.4            highr_0.10            
[148] ROCR_1.0-11            gargle_1.4.0           igraph_1.4.3          
[151] broom_1.0.4            bslib_0.4.2            bit_4.0.5